Validity and the Social Dimension of Language Testing -Sakineh Yeganegi
a) Cronbach
b) Messick
b1) Messick on construct validity
b2) Construct definition and validation (Mislevy)

Validity is the extent to which the inferences or decisions we make on the basis of test scores are meaningful, appropriate, and useful. In other words, a test is said to be valid to the extent that it measures what it is supposed to measure or can be used for the purposes for which it is intended. Validity is traditionally classified into three types: content validity, criterion validity, and construct validity.

Messick (1989) describes validity as “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores” (Bachman, 1990, p. 236). Similarly, according to the definition of the American Psychological Association (1985, p. 9), validity is a unitary concept that always refers to the degree to which the evidence supports the inferences that are made from the scores. The inferences regarding specific uses of a test are validated, not the test itself (Bachman, 1990, p. 237).

Validity and validation are discussed from several angles in testing and assessment, including:
* theoretical principles of validity
* test development and operationalization
* validation procedures
* testing/assessment research
* social dimensions of validity and fairness
* validity in classroom assessment

Validity and the Social Dimension of Language Testing

Contemporary validity theory has developed procedures for supporting the rationality of decisions based on tests and has thus addressed issues of test fairness. Although validity theory has also developed ways of thinking about the social dimension of test use, many issues are still unresolved.

a) Cronbach

Discussions of validity in educational assessment are heavily influenced by the thinking of the American Lee Cronbach, who has been called the “father” of construct validity.
Cronbach was part of the American Psychological Association’s Committee on Psychological Tests, which met between 1950 and 1954 to develop criteria of test quality. The term construct validity was coined by Meehl and Challman. Cronbach and Meehl (1955) then developed the theory in detail in their classic article in the Psychological Bulletin. They proposed their new concept of construct validity as an alternative to criterion-related validity (McNamara and Roever, 2006, p. 11):

“Construct validity is ordinarily studied when the tester has no definite criterion measure of the quality with which he [sic] is concerned and must use indirect measures. Here the trait or quality underlying the test is of central importance, rather than either the test behavior or the scores on the criteria.” (Cronbach & Meehl, 1955, p. 283)

There is also clear recognition that validity is not a mathematical property like discrimination or reliability, but a matter of judgment. Cronbach emphasized the need for a validity argument that focuses on collecting evidence for or against a certain interpretation of test scores. He held that there is no such thing as a “valid test”, only more or less defensible interpretations: “One does not validate a test, but only a principle for making inferences” (Cronbach & Meehl, 1955, p. 297); “One validates not a test, but an interpretation of data arising from a specified procedure” (Cronbach, 1971, p. 447).

Cronbach and Meehl (1955) distinguished a weak and a strong program for construct validation. The weak program is a fairly haphazard collection of any sort of evidence that supports the particular interpretation to be validated. The strong program is based on the idea of falsification advanced in Popperian philosophy (Popper, 1962): rival hypotheses for interpretations are proposed and then examined logically or empirically (McNamara and Roever, 2006, p. 11).
In his writings on validation in measurement, Cronbach never strongly emphasized the importance of the sociopolitical context for the whole testing enterprise; by contrast, when writing about evaluation he emphasized that evaluations are sites of political difference and clashes of values. He did, however, mention the role of beliefs and values in validity arguments, which “must link concepts, evidence, social and personal consequences and values” (Cronbach, 1988, p. 4, cited in McNamara and Roever, 2006, p. 11). He also held that all interpretation involves questions of values: a persuasive defense of an interpretation will have to combine evidence, logic, and rhetoric, and what is persuasive depends on the beliefs in the community (Cronbach, 1989, p. 152).

Cronbach agreed with Messick that validation has a duty to consider test consequences and to help prevent negative ones. He recognized that judgments of positive or negative consequences depend on social views of what is a desirable result, and that these views and values change over time. There is thus a concern for social consequences as a kind of corrective to an earlier, purely cognitive and individualistic way of thinking about tests. Cronbach’s difficulty in reconciling his work on construct validity with his concern for social and political values has remained characteristic of both educational measurement and language testing. The same difficulty in reconciling these concerns also characterizes the work of Messick.

b) Messick

Against the background of this difficulty, Samuel Messick developed his theory of validity. Messick made the social dimension of assessment explicit in his model. Like Cronbach, Messick saw assessment as a process of reasoning and evidence gathering carried out so that inferences can be made about individuals, and he saw the task of establishing the meaningfulness and defensibility of those inferences as the primary task of assessment development and research.
Messick introduced the social dimension more explicitly by arguing two things. First, our understanding of what it is that we are measuring, and of the things we prioritize in measurement, will reflect values, which we can assume to be social and cultural in origin. Second, tests have real effects in the educational and social contexts in which they are used, and these effects need to be matters of concern for those responsible for the test. Bringing these aspects of validity together, Messick set out a unified theory of validity (Figure 2.1) (McNamara and Roever, 2006, p. 13).

Figure 2.1. Facets of validity. From Messick (1989, p. 20).

Figure 2.2 makes clear how Messick’s theory accounts for aspects of the social dimension of assessment. In this model, the social dimension is located in the bottom two cells of the matrix: one addresses the social and cultural character of the meanings attributed to test scores; the other addresses the real-world consequences of the practical use of tests.

Figure 2.2. Understanding Messick’s validity matrix.

Messick presented a unified and expanded theory of validity, which included the evidential and consequential bases of test interpretation and use. Table 1 shows how this theory works. Notice that the evidential basis for validity includes both test score interpretation and test score use. The evidential basis for interpreting tests involves the empirical study of construct validity, which Messick defines as the theoretical context of implied relationships to other constructs. The evidential basis for using tests involves the empirical investigation of both construct validity and relevance/utility, which are defined as the theoretical contexts of implied applicability and usefulness.

Messick’s insistence on the need to investigate the overtly social dimension of assessment, especially the consequences of test use, has met with disapproval. His thinking on construct validity has been interpreted in greater detail by two leading theorists: Mislevy and Kane.
b1) Messick on Construct Validity

In Messick’s matrix, cell 1 represents the process of construct definition and validation, which underlies the claims we make about individuals on the basis of test scores. Those claims then provide the rationale for making decisions about individuals (cells 2 and 4). Supporting the adequacy of claims about an individual with reasoning and evidence, and demonstrating the relevance of the claims to the decisions we wish to make, are fundamental to the process.

Consider, for example, decisions about admitting a candidate to an institutional setting. Decisions in such cases are based on a belief about whether the person concerned will be able to cope communicatively once admitted and about the demands that they are likely to make on the institutional setting in terms of support, risk of failure, adjustment to routine procedures, and so on. Deciding whether the person should be admitted then depends on two prior steps: modeling what you believe the demands of the target setting are likely to be, and predicting what the standing of the individual is in relation to this construct. Clearly, then, both the construct (what we believe “coping communicatively” in relevant settings to mean) and our view of the individual’s standing are matters of belief and opinion, and each must be supported with reasoning and evidence before a defensible decision about the individual can be made.

The test is a procedure for gathering evidence in support of decisions that need to be made and for interpreting that evidence carefully. It involves making some observations and then interpreting them in the light of certain assumptions about the requirements of the target setting and the relationship of the evidence to those assumptions. The relationships among test, construct, and target are set out in Figure 2.3.

Figure 2.3. Target, construct, and test as the basis for inferences leading to decisions.

Tests and assessments thus represent systematic approaches to constraining these inferential processes in the interests of guaranteeing their fairness or validity.
Validity therefore implies considerations of social responsibility, both to the candidate (protecting him or her against unfair exclusion) and to the receiving institution and those (in the case of health professionals) whose quality of health care will be a function in part of the adequacy of the candidate’s communicative skill. Fairness in this sense can only be achieved by carefully planning the design of the observations of candidate performance and by carefully articulating the relationship between the evidence we gain from test performance and the inferences about candidate standing that we wish to make from it.

Test validation steers between the Scylla and Charybdis of what Messick called construct underrepresentation, on the one hand, and construct-irrelevant variance, on the other. The former warns of the danger that the assessment requires less of the test taker than is required in reality. The latter warns that differences in scores might not be due only to differences in the ability being measured, but that other factors are illegitimately affecting scores.

b2) Construct Definition and Validation: Mislevy

Mislevy and his colleagues give clear analytic expression to the procedures involved in designing tests. Mislevy calls the underlying reasoning the “assessment argument” (Figure 2.4; McNamara and Roever, 2006, p. 19). This argument is needed to establish the relevance of assessment data and its value as evidence. As quoted in McNamara and Roever (2006, p. 19), Mislevy characterizes assessment as follows: “An assessment is a machine for reasoning about what students know, can do, or have accomplished, based on a handful of things they say, do, or make in particular settings” (Mislevy, Steinberg, & Almond, 2003, p. 4).

Figure 2.4. The assessment argument.

Mislevy has developed an approach called Evidence Centered Design (Figure 2.5), which focuses on the chain of reasoning in designing tests.
This approach tries to set up a clear relationship between the claims we wish to be able to make about test candidates and the evidence on which those claims are based.

Figure 2.5. Evidence Centered Design.

The first stage, called Domain Analysis, involves what in performance assessment is traditionally called job analysis. The second stage is Domain Modeling. It includes modeling three things: claims, evidence, and tasks. Modeling claims and evidence is equivalent to articulating the test construct.

Step 1 involves the test designer in expressing the claims the test will make about candidates on the basis of test performance. Mislevy sees the purposes that tests are to serve, including the decisions that they are to support, as influencing the formulation of claims: yes/no decisions, as in decisions about admission, are different from pedagogic decisions about what features of language to focus on in remedial work.

Step 2 involves determining the kind of evidence that would be necessary to support the claims established in step 1. This stage of test design clearly depends on a theory of the characteristics of a successful performance, in other words, a construct that will also typically be reflected in the categories of rating scales used to judge the adequacy of the performance.

Step 3 involves defining in general terms the kinds of task in which the candidate will be required to engage so that the evidence set out in step 2 might be sought.

All three steps precede the actual writing of specifications for test tasks; they constitute the “thinking stage” of test design. Only when this chain of reasoning is completed can the specifications for test tasks be written.
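The chain of reasoning in the domain modeling stage can be pictured as a small data model. The following is only an illustrative sketch, not part of Mislevy’s framework: the class names (`Claim`, `Evidence`, `Task`), the helper functions, and the example claim about a health professional are all hypothetical and chosen for this illustration. It shows the single idea that task specifications should wait until every claim is backed by evidence and every piece of evidence can be elicited by at least one kind of task.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Step 3: a kind of task, described in general terms (hypothetical model)."""
    description: str


@dataclass
class Evidence:
    """Step 2: an observable that would support a claim."""
    description: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Claim:
    """Step 1: a claim the test will make about candidates."""
    statement: str
    evidence: List[Evidence] = field(default_factory=list)


def gaps_in_argument(claims: List[Claim]) -> List[str]:
    """List every break in the claim -> evidence -> task chain."""
    gaps = []
    for claim in claims:
        if not claim.evidence:
            gaps.append(f"claim without evidence: {claim.statement}")
        for ev in claim.evidence:
            if not ev.tasks:
                gaps.append(f"evidence without task: {ev.description}")
    return gaps


def ready_for_specifications(claims: List[Claim]) -> bool:
    """Specifications can be written only when the chain has no gaps."""
    return bool(claims) and not gaps_in_argument(claims)


# Hypothetical example: an admission (yes/no) decision about a health professional.
role_play = Task("simulated consultation with a patient")
clear_explanation = Evidence("explains a treatment plan intelligibly", [role_play])
unsupported = Evidence("responds appropriately to patient concerns")  # no task yet

complete_claim = Claim("can cope communicatively in consultations", [clear_explanation])
incomplete_claim = Claim("can handle sensitive conversations", [unsupported])

print(ready_for_specifications([complete_claim]))
print(gaps_in_argument([complete_claim, incomplete_claim]))
```

In this sketch the second claim is reported as a gap because its evidence has no task that could elicit it, which mirrors the point that the “thinking stage” has to be finished before specifications are drafted.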
The remaining stages of Mislevy’s Evidence Centered Design deal with turning the conceptual framework developed in the domain modeling stage into an actual assessment and with ensuring that the psychometric properties of the test provide evidence in support of the ultimate claims that we wish to make about test takers. He proposes a series of statistical models in which test data are analyzed to support (or challenge) the logic of the assessment. The discussion here is highly technical and deals in particular with the requirements, in terms of measurement, scoring, and logistics, for an assessment to implement the relationships set out in the domain modeling. The final outcome is an operational assessment.

Mislevy’s conceptual analysis is impressive. Note, however, that its consideration of the social dimension of assessment remains implicit and limited to issues of fairness (McNamara, 2003). Mislevy does not deal directly with the uses of test scores, that is, the decisions for which they form the basis, except insofar as they determine the formulation of relevant claims, which, in any case, is taken as a given and is not problematized.

Conclusion

This article has considered the nature and main concerns of current validity theory, in particular the influential model of Messick and its interpretation within both educational measurement and language testing. It has also shown the struggle the field has faced in incorporating questions of policy and values into what was initially the secure domain of traditional psychometric considerations and constraints, and it has touched on the social function of tests in language testing.